Systems and methods for robust low-rank matrix approximation

ABSTRACT

Systems and methods which provide robust low-rank matrix approximation using low-rank matrix factorization in the lp-norm space, where p<2 (e.g., 1≤p<2), providing a lp-PCA technique, are described. For example, embodiments are configured to provide robust low-rank matrix approximation using low-rank matrix factorization in the least absolute deviation (l1-norm) space providing a l1-PCA technique. Embodiments minimize the lp-norm of the residual matrix in the subspace factorization of an observed data matrix, such as to minimize the l1-norm of the residual matrix where p=1. The alternating direction method of multipliers (ADMM) is applied according to embodiments to solve the subspace decomposition of the low-rank matrix factorization with respect to the observed data matrix. Iterations of the ADMM may comprise solving a l2-subspace decomposition and calculating the proximity operator of the l1-norm.

TECHNICAL FIELD

The invention relates generally to extraction of principal components from observed data matrices and, more particularly, to robust low-rank matrix approximation of observed data matrices.

BACKGROUND OF THE INVENTION

Many real-world signals, such as textual, visual, audio, and financial data, lie near some low-dimensional subspace. That is, the matrices constructed using these observations (referred to herein as observed data matrices) as column vectors are often of relatively low rank, and thus data of many such real-world signals can be approximated by matrices whose ranks are much smaller than their column and row lengths (i.e., low-rank matrix approximations of observed data matrices). Accordingly, low-rank matrix approximation, wherein the fit between a given observed data matrix and an approximation matrix is minimized, may be used for mathematical modeling, data compression, etc. with respect to the data of such signals. The purpose of low-rank matrix approximation is to extract the low-dimensional subspaces or principal components of the observed data matrices constructed from the signals. Such low-rank matrix approximation has, for example, been a core task in many important areas including dimensionality reduction, computer vision, machine learning and signal processing, especially with high-dimensional datasets.

Principal component analysis (PCA) is a standard tool to seek the best low-rank representation of a given observation data matrix in the least squares (l₂-norm) space. PCA, which can be computed via truncated singular value decomposition (SVD), is a linear transformation that rigidly rotates the coordinates of a given set of data, so as to maximize the variance of the data in each of the new dimensions in succession. As a result, using PCA, it is possible to describe the data in a lower-dimensional space, retaining only the first few principal components, and discarding the rest. However, conventional PCA does not work well in the presence of impulsive noise (e.g., non-Gaussian disturbances) and/or outliers in the observation data matrix. This is because the design of conventional PCA techniques utilizes least squares or the l₂-norm minimization with respect to the observation data matrix, and the SVD cannot be generalized to the l_(p)-norm except for p=2, indicating that such conventional PCA (referred to herein as l₂-PCA) techniques are well suited only for additive Gaussian noise.

An existing method for robust low-rank matrix approximation is alternating convex optimization (ACO). In the ACO, the objective function is minimized over one factored matrix while the other factor is fixed in an iterative manner. Subspace estimation performance of the ACO is less than desired and computational complexity of the ACO is high. Accordingly, the use of ACO for robust low-rank matrix approximation is unsatisfactory with respect to some scenarios.

As can be appreciated from the foregoing, existing solutions for providing low-rank approximation of observed data matrices, such as PCA, are not robust with respect to impulsive noise and outlier points in the observed data matrix. The existing solutions for providing robust low-rank approximation of observed data matrices, such as ACO, can work in impulsive noise environments but are more computationally demanding and have somewhat poor performance in subspace estimation.

BRIEF SUMMARY OF THE INVENTION

The present invention is directed to systems and methods which provide robust low-rank matrix approximation using low-rank matrix factorization in the l_(p)-norm space, where p<2 (e.g., 1≤p<2), (referred to herein as l_(p)-PCA). For example, embodiments are configured to provide robust low-rank matrix approximation using low-rank matrix factorization in the least absolute deviation (l₁-norm) space (referred to herein as l₁-PCA).

Embodiments of the invention minimize the l_(p)-norm of the residual matrix in the subspace factorization of an observed data matrix. For example, embodiments of a l₁-PCA configuration, where p=1, operate to minimize the l₁-norm of the residual matrix in the subspace factorization of an observed data matrix, in contrast to the least squares (l₂-norm) space factorization and minimization commonly used by PCA (e.g., the aforementioned l₂-PCA). The alternating direction method of multipliers (ADMM) is applied according to embodiments to solve the subspace decomposition of the low-rank matrix factorization with respect to the observed data matrix. By way of example, the ADMM may be applied in operation of embodiments of a l₁-PCA configuration to solve the l₁-norm subspace decomposition of the low-rank matrix factorization with respect to the observed data matrix. Iterations of the ADMM may comprise solving a l₂-subspace decomposition (e.g., using the least Frobenius norm solved by the truncated SVD) and calculating the proximity operator of the l₁-norm (e.g., using a closed-form soft-thresholding operator for complex variables).

As can be seen from the foregoing, embodiments of the present invention implement a l_(p)-PCA (e.g., a least absolute deviation or l₁-norm PCA) technique for low-rank matrix approximation of observed data matrices. Implementations of such a l_(p)-PCA technique may utilize a user-defined parameter in the form of the target rank of the low-rank matrix component. Such a target rank of the low-rank matrix component is generally readily determinable in practical applications (e.g., video surveillance, machine learning, web search, bioinformatics, dimensionality reduction, signal processing, etc.).

It should be appreciated that the low-rank matrix approximations provided using l_(p)-PCA techniques of embodiments are robust with respect to impulsive noise and outlier points in the observed data matrix in light of the l_(p)-norm space, where p<2, of the low-rank matrix factorization of a l_(p)-PCA technique being resistant to impulsive noise and outliers. Moreover, the application of ADMM to solve the l_(p)-norm subspace decomposition according to embodiments of a l_(p)-PCA technique improves the numerical performance compared with the alternating minimization (e.g., ACO providing minimization of an objective function over one factored matrix while another factor is fixed). The low-rank matrix factorization of a l_(p)-PCA technique of the present invention is superior to the ACO in terms of robust subspace estimation performance and computational complexity. For example, a l₁-PCA technique of embodiments converges to a superior solution that has a smaller objective function value and more accurate subspace estimation than that of the conventional ACO technique.

Embodiments of the present invention provide a number of advantages over existing low-rank matrix approximation technologies, such as the aforementioned l₂-PCA technique. For example, embodiments provide a matrix factorization approach configured for low-rank matrix approximation in the presence of impulsive noise, outliers, anomalies, or sparse features. The robust l_(p)-norm minimization utilized according to embodiments of a l_(p)-PCA technique herein works well in such environments while the conventional l₂-PCA techniques, including the PCA which is based on l₂-norm minimization, fail to operate. The use of ADMM for solving the subspace decomposition according to l_(p)-PCA embodiments provides superior comparative performance, which is demonstrated via the application of source localization in impulsive noise environments.

The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims. The novel features which are believed to be characteristic of the invention, both as to its organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present invention.

BRIEF DESCRIPTION OF THE DRAWING

For a more complete understanding of the present invention, reference is now made to the following descriptions taken in conjunction with the accompanying drawing, in which:

FIG. 1 shows a flow diagram illustrating operation according to a l_(p)-PCA technique for low-rank matrix approximation of observed data matrices according to embodiments of the invention;

FIG. 2 shows a processor-based system configured for implementing a l_(p)-PCA technique of embodiments of the invention;

FIG. 3 shows pseudocode providing logic implementing the ADMM of a l_(p)-PCA technique for low-rank matrix approximation according to embodiments of the invention;

FIG. 4 shows a graph of the difference in a conventional ACO objective function and an ADMM objective function according to embodiments of the invention;

FIG. 5 shows graphs of normalized decrease of objective function versus iteration number for a conventional ACO scheme and ADMM schemes according to embodiments of the invention;

FIG. 6 shows graphs of normalized residual of ADMM versus iteration number in accordance with embodiments of the invention;

FIG. 7 shows graphs of subspace distance versus iteration number for a conventional ACO scheme and ADMM schemes according to embodiments of the invention;

FIG. 8 shows graphs of root mean square error (RMSE) of subspace distance versus signal-to-noise ratio (SNR) for Gaussian mixture (GM) noise for various conventional schemes and an ADMM scheme according to embodiments of the invention;

FIG. 9 shows graphs of RMSE of direction-of-arrival (DOA) of the first source versus SNR for GM noise for various conventional schemes and an ADMM scheme according to embodiments of the invention;

FIG. 10 shows graphs of RMSE of subspace distance versus SNR for generalized Gaussian distribution (GGD) noise for various conventional schemes and an ADMM scheme according to embodiments of the invention;

FIG. 11 shows graphs of RMSE of DOA of the first source versus SNR for GGD noise for various conventional schemes and an ADMM scheme according to embodiments of the invention;

FIGS. 12A-12C show results of a l_(p)-PCA technique employed in the application of texture inpainting; and

FIGS. 13A-13C and 14A-14C show results of l_(p)-PCA techniques employed in the application of video background extraction.

DETAILED DESCRIPTION OF THE INVENTION

Observed data may be represented as the matrix M∈ℂ^(n₁×n₂). For example, the matrix M may comprise a matrix of a plurality of data points (e.g., one per row), a single object (e.g., a rectangular image with the matrix entries being pixel intensities), etc. Low-rank matrix approximation provides a lossy, compressed version of the observed data matrix. Such low-rank matrix approximation may, for example, be utilized in many application areas, such as dimensionality reduction, computer vision, machine learning, signal processing, etc.

Embodiments of the present invention implement low-rank matrix factorization in the l_(p)-norm space (i.e., function spaces defined using a natural generalization of the p-norm for finite-dimensional vector spaces), where p<2 (e.g., 1≤p<2), (referred to herein as l_(p)-PCA, it being understood that l_(p)-PCA as used herein refers to cases that employ the l_(p)-norm cost function with p<2) for providing robust low-rank matrix approximation. For example, where p=1, embodiments implement a least absolute deviation or l₁-norm PCA (referred to herein as l₁-PCA) technique for low-rank matrix approximation of observed data matrices. A l_(p)-PCA technique for low-rank matrix approximation of observed data matrices of embodiments herein is configured to extract the low-rank matrix or principal components from an observed data matrix, possibly with impulsive noise, outliers, or sparse features.

In operation according to embodiments, a l_(p)-PCA technique for low-rank matrix approximation of observed data matrices provides the low-rank matrix approximation using low-rank matrix factorization in the subspace by minimizing the l_(p)-norm of the residual matrix in the subspace factorization of an observed data matrix. For example, embodiments of a l₁-PCA configuration, where p=1, operate to minimize the l₁-norm of the residual matrix in the subspace factorization of an observed data matrix. The alternating direction method of multipliers (ADMM) is applied according to embodiments to solve the subspace decomposition of the low-rank matrix factorization with respect to the observed data matrix, such as to solve the l₁-norm subspace decomposition of the low-rank matrix factorization with respect to the observed data matrix in a l₁-PCA configuration where p=1. The ADMM technique implemented in accordance with embodiments converts the minimization of a nonsmooth l_(p)-norm into a Frobenius norm minimization at each iteration, such as may be efficiently solved by the truncated SVD. Accordingly, iterations of the ADMM may comprise solving a l₂-subspace decomposition and calculating the proximity operator of the l_(p)-norm.

FIG. 1 shows a flow diagram illustrating operation according to a l_(p)-PCA technique for low-rank matrix approximation of observed data matrices according to an exemplary embodiment of the invention. The functions of flow 100 setting forth a l_(p)-PCA technique for low-rank matrix approximation of the embodiment illustrated in FIG. 1 may, for example, comprise logic implemented by operation of a processor-based system, such as computer system 200 of FIG. 2. As one example, functions of flow 100 may be provided as processor executable instructions stored in memory which, when executed by a processor, perform operations as described herein. Accordingly, computer system 200 executing instructions of the functions of flow 100 provides a processor-based system configured for l_(p)-PCA low-rank matrix approximation according to embodiments of the present invention.

Computer system 200 may, for example, comprise a server system, a personal computer, a laptop computer, a notebook computer, a tablet device, a smartphone, a personal digital assistant, an Internet of Things (IoT) device, or other processor-based platform having sufficient processing power and resources for implementing functions of a l_(p)-PCA technique for low-rank matrix approximation of embodiments herein. Accordingly, computer system 200 of the illustrated embodiment includes central processing unit (CPU) 201 coupled to system bus 202. CPU 201 may be any general purpose CPU, such as a processor from the PENTIUM or CORE family of processors available from Intel Corporation or a processor from the POWERPC family of processors available from the AIM alliance (Apple Inc., International Business Machines Corporation, and Motorola Inc.). However, the present invention is not restricted by the architecture of CPU 201 as long as CPU 201 supports the inventive operations as described herein.

Bus 202 of the illustrated embodiment is coupled to random access memory (RAM) 203, such as may comprise SRAM, DRAM, SDRAM, flash memory, and/or the like. Read only memory (ROM) 204, such as may comprise PROM, EPROM, EEPROM, and/or the like, is also coupled to bus 202 of the illustrated embodiment. RAM 203 and ROM 204 hold user and system data and programs as is well known in the art. Bus 202 is also coupled to input/output (I/O) controller 205, communications adapter 211, user interface adapter 208, and display adapter 209.

I/O controller 205 connects storage device 206, such as may comprise one or more of a hard disk, an optical disk (e.g., compact disk (CD) or digital versatile disk (DVD)), a floppy disk, and a tape, to the computer system. I/O controller 205 of the illustrated embodiment is also connected to printer 214, which would allow the system to print information such as documents, photographs, etc. Such a printer may be a traditional printer (e.g., dot matrix, laser, etc.), a fax machine, a copy machine, and/or the like.

Communications adapter 211 is adapted to couple computer system 200 to network 212 to provide communications to and/or from external systems, devices, networks, etc. Network 212 may comprise the public switched telephone network (PSTN), a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), an extranet, an intranet, the Internet, a cellular network, a cable transmission network, and/or the like.

User interface adapter 208 of the illustrated embodiment couples various user input devices to the computer system. For example, keyboard 213, pointing device 207, and microphone 216 may be coupled through user interface adapter 208 to accept various forms of user input. Similarly, speakers 215 may be coupled through user interface adapter 208 to provide user interface output.

The display adapter 209 provides an interface to display 210. Accordingly, CPU 201 may control display of various information, including text, graphics, and images, upon display 210 through display adapter 209. Display 210 may comprise a cathode ray tube (CRT) display, a plasma display, a liquid crystal display (LCD), a touch screen, a projector, and/or the like. Although not expressly shown in the illustrated embodiment, display 210 may provide for input of data as well as output of data. For example, display 210 may comprise a touch screen display according to embodiments of the invention.

When implemented in software, elements of embodiments of the present invention are essentially code segments operable upon a computer system, such as computer system 200, to perform the necessary tasks. The program or code segments can be stored in a computer readable medium, such as RAM 203, ROM 204, and/or storage device 206. Additionally or alternatively, the code segments may be downloaded via computer networks, such as network 212.

Referring again to FIG. 1, flow 100 of the illustrated embodiment provides robust low-rank matrix approximation of an observed data matrix using low-rank matrix factorization in the l_(p)-norm space, where p<2 (e.g., 1≤p<2). Accordingly, at block 101 of flow 100 the data of an observed data matrix for which low-rank matrix approximation is to be provided are obtained. The observed data matrix may, for example, be derived from various signals, such as textual, graphical, audio, video, multimedia, financial data, etc., wherein the observed data matrix comprises a matrix of a plurality of data points (e.g., one per row) representing the signal. Additionally or alternatively, the observed data matrix may be derived from an object, such as a rectangular image where the matrix entries represent pixels of the image. Irrespective of the particular source and type of data of the observed data matrix, the data of embodiments are such that it is desirable to approximate them by a low-rank matrix and/or to extract principal components from them, such as in association with video surveillance, machine learning, web search, bioinformatics, dimensionality reduction, signal processing, etc.

Accordingly, block 102 of the illustrated embodiment provides low-rank matrix approximation of the observed data matrix using low-rank matrix factorization in the l_(p)-norm space, where p<2. The low-rank matrix approximations provided using l_(p)-PCA techniques implementing low-rank matrix factorization in the l_(p)-norm space in accordance with block 102 of the illustrated embodiment are robust with respect to impulsive noise and outlier points in the observed data matrix.

In an exemplary embodiment using low-rank matrix factorization in the l_(p)-norm space in operation according to block 102, p=1 and the low-rank matrix factorization is realized in the l₁-norm space. In operation according to embodiments, minimization of the residual matrix in the subspace factorization of an observed data matrix may be implemented by application of the ADMM to the observed data matrix. In facilitating a l_(p)-PCA technique for low-rank matrix approximation by application of the ADMM to the observed data matrix, embodiments utilize a predetermined target rank for the low-rank matrix component. Accordingly, target rank 150, such as may comprise a user-defined parameter determined based upon the particular application in which the low-rank matrix approximation is being provided, may be provided for use in the low-rank matrix factorization in the l_(p)-norm space (e.g., for finding the low-rank matrix approximation given the observed data matrix and the target rank for the low-rank matrix approximation).

For the case of 1<p<2, exemplary embodiments using low-rank matrix factorization in the l_(p)-norm space in operation according to block 102 compute the proximity operator of the pth power of the l_(p)-norm. This problem is separable and can be decomposed into n₁n₂ independent scalar minimization problems of the form:

$\begin{matrix}{{\min\limits_{z}{g(z)}}:={{\frac{1}{2}{{z - y}}^{2}} + {\frac{1}{\mu}{{z}^{p}.}}}} & (1)\end{matrix}$

It should be appreciated that in equation (1) above subscripts and superscripts are omitted for presentation simplicity. For the complex-valued case, the scalar minimization problem set forth in equation (1) is a two-dimensional optimization problem, which needs to resort to gradient descent. That is, z may be updated by z←z−s∇g(z), where s is the step size, which can be determined using a line search procedure, and

${\nabla{g(z)}} = {{{z - y}} + {\frac{pz}{\mu}{{z}^{p - 2}.}}}$

This gradient descent iterative procedure is more time-consuming than the closed-form update available in embodiments in which p=1.
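By way of illustration only, the following is a minimal Python sketch of this gradient descent evaluation of the proximity operator for 1<p<2; the function name, the backtracking line search, and the stopping tolerances are illustrative assumptions rather than a prescribed implementation:

```python
def prox_lp_scalar(y, mu, p, max_iter=100, tol=1e-10):
    """Hypothetical helper: approximately evaluate the proximity operator
    of eq. (1), argmin_z 0.5*|z - y|^2 + (1/mu)*|z|^p, for a complex
    scalar y and 1 < p < 2, by gradient descent with backtracking."""
    if y == 0:
        return 0.0
    g = lambda z: 0.5 * abs(z - y) ** 2 + abs(z) ** p / mu
    z = y  # initialize at the data point
    for _ in range(max_iter):
        if abs(z) < tol:  # guard: gradient is singular at the origin
            return 0.0
        grad = (z - y) + (p / mu) * z * abs(z) ** (p - 2)  # gradient given above
        s = 1.0
        while g(z - s * grad) > g(z):  # backtracking line search for step size s
            s *= 0.5
            if s < 1e-12:
                return z
        z_new = z - s * grad
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z
```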

The ADMM may use iterative steps for providing minimization of the l_(p)-norm of the residual matrix in the subspace factorization, as shown by iteration loop 120 in block 102 of the exemplary embodiment. In operation according to an embodiment implementing iteration loop 120, each iteration of the ADMM comprises solving a l₂-subspace decomposition, such as by using the least Frobenius norm solved by the truncated SVD, at block 121. Each iteration of the ADMM implementing iteration loop 120 further comprises calculating the proximity operator of the l_(p)-norm, such as by using a closed-form soft-thresholding operator for complex variables, at block 122.

Having approximated the observed data matrix by deriving a corresponding low-rank matrix estimate of the observed data matrix at block 102, processing according to the illustrated embodiment of flow 100 proceeds to block 103 wherein the low-rank matrix approximation and/or principal components of the observed data matrix, as may be readily extracted from the low-rank matrix approximation, are provided for use in a particular application. For example, once the ADMM converges upon a solution for the subspace estimation, the results may be provided to one or more processes of computer system 200 for use in one or more applications, such as surveillance, machine learning, web search, bioinformatics, dimensionality reduction, signal processing, etc. Additionally or alternatively, the results may be output, such as to an external system or a user, for ultimate consumption.

It should be appreciated that operation of a processor-based system, such as computer system 200, configured for low-rank matrix approximation in the l_(p)-norm space, where p<2, (e.g., l₁-PCA) according to embodiments of the present invention is improved as compared to operation of a processor-based system configured for conventional l₂-PCA low-rank matrix approximation. In particular, a processor-based system configured for low-rank matrix approximation in the l_(p)-norm space, where p<2, in accordance with embodiments herein provides a matrix factorization configuration for low-rank matrix approximation in the presence of impulsive noise, outliers, anomalies, and/or sparse features. Accordingly, embodiments of processor-based systems configured for low-rank matrix approximation in the l_(p)-norm space, where p<2, are operable to compute more accurate subspace estimation that is robust with respect to impulsive noise, outlier points, anomalies, and/or sparse features in the observed data matrix (i.e., resistant to errors in the resulting estimation due to impulsive noise, outlier points, anomalies, and/or sparse features). Moreover, a processor-based system configured for low-rank matrix approximation in the l_(p)-norm space, where p=1, (i.e., l₁-PCA) in accordance with embodiments herein using the ADMM to solve the l₁-norm subspace decomposition significantly improves the numerical performance compared with the ACO technique. Further, the computational cost and complexity of the computations performed in the l₁-PCA technique implemented by a processor-based system configured in accordance with the concepts herein is much less than that required for the ACO technique.

Having described embodiments configured for low-rank matrix approximation of observed data matrices in the l_(p)-norm space, where p<2, (e.g., l₁-PCA), further details with respect to concepts employed by a l_(p)-PCA technique providing the low-rank matrix approximation using low-rank matrix factorization in the l_(p)-norm space are provided below. In understanding implementation of a l_(p)-PCA technique for low-rank matrix approximation of observed data matrices, such as in accordance with flow 100 of the exemplary embodiment of FIG. 1, it should be understood that the aforementioned observed data matrix M∈ℂ^(n₁×n₂) may be factored as:

M=L+Q,  (2)

where L is the low-rank matrix component having rank(L)=r<<min(n₁, n₂) and Q is the perturbation matrix which may contain additive disturbances possibly with outlier values.

Accordingly, low-rank matrix approximation may be recast as the task of finding L given M and r.

Because the matrix L has low rank, it may be rewritten as:

L=UV,  (3)

where U∈ℂ^(n₁×r) is an n₁×r matrix and V∈ℂ^(r×n₂) is an r×n₂ matrix. Conventionally, U and V are estimated from the following squared error minimization:

$\begin{matrix}{{{\min\limits_{U,V}{f_{2}\left( {U,V} \right)}}:={{{UV} - M}}_{F}^{2}}{where}} & (4) \\{{M}_{F} = \left( {\sum\limits_{i = 1}^{n_{1}}\; {\sum\limits_{j = 1}^{n_{2}}\; {m_{i,j}}^{2}}} \right)^{1/2}} & (5)\end{matrix}$

is the Frobenius norm of M, with m_(i,j) being its (i,j) entry. When all elements in Q are zero-mean white Gaussian variables, the Frobenius norm minimization in equation (4) results in the maximum likelihood (ML) estimates of U and V.

According to the Eckart-Young theorem, the global solution of equation (4) can be obtained via the truncated SVD of M, although the objective function f₂(U,V) is nonconvex with respect to both U and V. Without loss of generality, assuming that n₂≥n₁, the SVD of M may be expressed as:

M=GΣH^(H),  (6)

where G=[g₁ g₂ . . . g_(n₁)]∈ℂ^(n₁×n₁) and H=[h₁ h₂ . . . h_(n₂)]∈ℂ^(n₂×n₂) are orthonormal matrices (i.e., the columns of the matrices are orthonormal vectors) whose columns are the corresponding left and right singular vectors of M, respectively, while Σ=diag(λ₁, λ₂, . . . , λ_(n₁))∈ℝ₊^(n₁×n₁) is the diagonal matrix of singular values of M with λ₁≥λ₂≥ . . . ≥λ_(n₁). The ML estimates of U and V, denoted by Û and V̂, are given by:

Û=G_(s), V̂=Σ_(s)H_(s)^(H),  (7)

where G_(s)=[g₁ g₂ . . . g_(r)]∈ℂ^(n₁×r), Σ_(s)=diag(λ₁, λ₂, . . . , λ_(r))∈ℝ₊^(r×r), and H_(s)=[h₁ h₂ . . . h_(r)]∈ℂ^(n₂×r). According to equation (3), L̂ may be computed as:

L̂=ÛV̂.  (8)
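As an illustrative aid, the truncated SVD computation of equations (6)-(8) may be sketched as follows in Python with NumPy (the function name is an illustrative assumption; np.linalg.svd returns the full factorization, from which the r dominant components are retained):

```python
import numpy as np

def truncated_svd_factors(M, r):
    """Best rank-r least-squares factorization per eqs. (6)-(8):
    returns U_hat = G_s and V_hat = Sigma_s @ H_s^H so that
    L_hat = U_hat @ V_hat is the rank-r Frobenius-norm fit of M."""
    G, lam, HH = np.linalg.svd(M, full_matrices=False)  # M = G @ diag(lam) @ HH
    U_hat = G[:, :r]                   # r dominant left singular vectors (G_s)
    V_hat = lam[:r, None] * HH[:r, :]  # Sigma_s @ H_s^H
    return U_hat, V_hat
```

Here U_hat @ V_hat realizes the estimate L̂ of equation (8).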

It should be appreciated that the Frobenius norm is not a robust cost function because it exploits the squared error. Thus, the performance of equation (7) will degrade when the probability density function (PDF) of the noise is impulsive or Q contains outliers. Accordingly, the low-rank matrix approximation implemented according to embodiments of the present invention employs the l_(p)-norm cost function with p<2 (e.g., 1≤p<2) to robustify equation (4) as:

$\begin{matrix}{{{\min\limits_{U,V}{f_{p}\left( {U,V} \right)}}:={{{UV} - M}}_{p}^{p}},} & (9)\end{matrix}$

For 1≤p<2, the element-wise l_(p)-norm may be defined as:

$\begin{matrix}{{M}_{p} = {\left( {\sum\limits_{i = 1}^{n_{1}}\; {\sum\limits_{j = 1}^{n_{2}}\; {m_{i,j}}^{p}}} \right)^{1/p}.}} & (10)\end{matrix}$

Equation (9) reduces to equation (4) at p=2, and therefore equation (9) may be said to generalize equation (4). As with equation (4), the l_(p)-minimization of equation (9) with respect to U and V is a nonconvex optimization problem; however, the SVD cannot be applied except for p=2 (i.e., the l₂-norm). Accordingly, embodiments of the invention consider the l₁-minimization, that is:

$\begin{matrix}{{\min\limits_{U,V}{f\left( {U,V} \right)}}:={{{UV} - M}}_{1}} & (11)\end{matrix}$

and thus provide a l₁-PCA technique for low-rank matrix approximation of observed data matrices that is configured to extract the low-rank matrix or principal components from an observed data matrix with impulsive noise, outliers, and/or sparse features. This is because, compared with other values of p∈(1, 2), the PCA technique with p=1 is more robust to outliers and computationally much simpler.

Where Z=UV−M, equation (11) is equivalent to the linearly constrained problem:

$\begin{matrix}{{\min\limits_{U,V,Z}{Z}_{1}},{{s.t.\mspace{14mu} Z} = {{UV} - {M.}}}} & (12)\end{matrix}$

The augmented Lagrangian of equation (12) is:

$\mathcal{L}_{\mu}\left( U,V,Z,\Lambda \right) = \left\| Z \right\|_{1} + \operatorname{Re}\left( \langle \Lambda, UV - Z - M \rangle \right) + \frac{\mu}{2}\left\| UV - Z - M \right\|_{F}^{2}, \qquad (13)$

where the matrix Λ∈ℂ^(n₁×n₂) contains the n₁n₂ Lagrange multipliers, ⟨A,B⟩=Σ_(i)Σ_(j)a_(i,j)*b_(i,j) represents the inner product of two complex-valued matrices A and B, and μ>0 is the penalty parameter. It should be appreciated that the augmented Lagrangian reduces to the unaugmented one if μ=0. Furthermore, the selection of μ is flexible and μ may be set as a fixed positive constant, although employing a time-varying μ at each iteration may improve the convergence somewhat in practice.

The Lagrange multiplier method solves the constrained problem of equation (12) by finding a saddle point of the augmented Lagrangian:

$\min\limits_{U,V,Z} \max\limits_{\Lambda} \mathcal{L}_{\mu}\left( U,V,Z,\Lambda \right). \qquad (14)$

Formulation (14) is a minimax problem where the primal variables {U, V, Z} and the dual variable Λ aim at decreasing and increasing ℒ_μ(U, V, Z, Λ), respectively. The ADMM may be used to calculate the saddle point in formulation (14). In particular, the ADMM may calculate the saddle point using the following iterative steps:

$\left( U^{k + 1},V^{k + 1} \right) = \arg\min\limits_{U,V} \mathcal{L}_{\mu}\left( U,V,Z^{k},\Lambda^{k} \right); \qquad (15)$

$Z^{k + 1} = \arg\min\limits_{Z} \mathcal{L}_{\mu}\left( U^{k + 1},V^{k + 1},Z,\Lambda^{k} \right); \text{ and} \qquad (16)$

$\Lambda^{k + 1} = \Lambda^{k} + \mu\left( U^{k + 1}V^{k + 1} - Z^{k + 1} - M \right), \qquad (17)$

where {U^(k), V^(k), Z^(k), Λ^(k)} represent the estimation results at the kth iteration.

It should be appreciated that the gradient of ℒ_μ(U^(k+1), V^(k+1), Z^(k+1), Λ) with respect to Λ is:

$\frac{\partial \mathcal{L}_{\mu}\left( U^{k + 1},V^{k + 1},Z^{k + 1},\Lambda \right)}{\partial \Lambda^{*}} = U^{k + 1}V^{k + 1} - Z^{k + 1} - M, \qquad (18)$

wherein the Wirtinger derivative is employed in equation (18) because Λ is complex-valued. It can be seen from the foregoing that equation (17) adopts a gradient ascent with a step size μ to update the dual variable Λ. In operation according to the above iterative steps, the ADMM updates {U,V} and Z in an alternating or sequential fashion to circumvent the difficulty of jointly minimizing with respect to the two primal blocks.

Equation (15) minimizes over {U,V} jointly. As such, the above iterative steps employing equations (15)-(17) correspond to a two-block ADMM (i.e., the two blocks being {U,V} and Z), not a three-block one. Thus, it should be appreciated that the ADMM implemented according to embodiments of a l₁-PCA technique for low-rank matrix approximation of observed data matrices does not have any divergence problems because the convergence of the two-block ADMM is guaranteed, although cases with more blocks may not necessarily be convergent.

The subproblem of equation (15) above may be reduced to an equivalent problem solved by the truncated SVD. For example, defining

$X^{k} = Z^{k} - \frac{\Lambda^{k}}{\mu} + M \qquad (19)$

and ignoring the constant term independent of {U,V}, the subproblem of equation (15) is equivalent to the following Frobenius norm minimization problem:

$\left( U^{k + 1},V^{k + 1} \right) = \arg\min\limits_{U,V} \left\| UV - X^{k} \right\|_{F}^{2}, \qquad (20)$

whose global minimizer can be obtained by the truncated SVD of X^(k). From equations (6)-(8):

U^(k+1)=G_(s)^(k), V^(k+1)=Σ_(s)^(k)(H_(s)^(k))^(H),  (21)

where Σ_(s)^(k)∈ℝ₊^(r×r) is the diagonal matrix whose diagonal elements are the r dominant singular values of X^(k), while the columns of G_(s)^(k)∈ℂ^(n₁×r) and H_(s)^(k)∈ℂ^(n₂×r) are the corresponding left and right singular vectors, respectively. It should be appreciated that, because the complexity of the truncated SVD is O(n₁n₂r) and the matrix rank is r<<min(n₁, n₂), the computational cost of equation (21) is much less than that of the full SVD, which requires a complexity of O(max(n₁²n₂, n₁n₂²)).

The subproblem of equation (16) may be simplified and expressed as:

$\begin{matrix}{{{\min\limits_{z}{\frac{1}{2}{{Z - Y^{k}}}_{F}^{2}}} + {\frac{1}{\mu}{Z}_{1}}}{where}} & (22) \\{Y^{k} = {{U^{k + 1}V^{k + 1}} + \frac{\Lambda^{k}}{\mu} - {M.}}} & (23)\end{matrix}$

The solution of equation (22) defines the proximity operator of the l₁-norm of a complex-valued matrix. Equation (22) is separable and thus may be decomposed into n₁n₂ independent scalar minimization problems:

$\min\limits_{z_{i,j}} \frac{1}{2}\left| z_{i,j} - y_{i,j}^{k} \right|^{2} + \frac{1}{\mu}\left| z_{i,j} \right|, \quad 1 \leq i \leq n_{1}, \; 1 \leq j \leq n_{2}. \qquad (24)$

The solution of equation (24) is the soft-thresholding operator for complex variables, which has the closed-form expression:

$z_{i,j}^{k + 1} = \frac{\max\left( \left| y_{i,j}^{k} \right| - 1/\mu, 0 \right)}{\max\left( \left| y_{i,j}^{k} \right| - 1/\mu, 0 \right) + 1/\mu}\, y_{i,j}^{k}. \qquad (25)$

It should be appreciated that equation (25) generalizes the popular real-valued soft-thresholding operator. In light of the foregoing, only a marginal complexity of O(n₁n₂) is needed to update Z in operation according to embodiments herein.
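For illustration, the soft-thresholding operator of equation (25) may be sketched as the following vectorized Python/NumPy routine (the function name is an illustrative assumption), which applies the closed form to all n₁n₂ entries at once:

```python
import numpy as np

def soft_threshold_complex(Y, mu):
    """Proximity operator of (1/mu)*||Z||_1 for a complex matrix, eq. (25):
    each entry's modulus is shrunk by 1/mu (to zero if already smaller)
    while its phase is preserved."""
    shrunk = np.maximum(np.abs(Y) - 1.0 / mu, 0.0)
    return shrunk / (shrunk + 1.0 / mu) * Y
```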

Reasons for the choice of p=1 with respect to the l_(p)-norm cost function in implementations of the low-rank matrix approximation according to embodiments of the present invention can be appreciated from the foregoing. In particular, the proximity operator of the l₁-norm has a simple closed-form solution while the l_(p)-norm with p≠1 does not. Although the proximity operator of the pth power of the l_(p)-norm can be exactly solved, as it is a convex problem for 1<p<2, an iterative procedure is required because there is no closed-form expression, which makes for a time-consuming and processor-intensive task. Moreover, the soft-thresholding operator shrinks values larger than the threshold toward zero and sets the remaining values to zero, and thus outlier reduction is automatically achieved. Accordingly, the l₁-subspace decomposition is more robust against outliers than employing 1<p<2.

FIG. 3 shows pseudocode providing logic implementing the ADMM of a l₁-PCA technique for low-rank matrix approximation in accordance with the foregoing. As may be seen in pseudocode logic 300 of FIG. 3, the ADMM converts the minimization of a nonsmooth l₁-norm into a Frobenius norm minimization at each iteration, which is then efficiently solved by the truncated SVD. It should be appreciated that the additional cost for computing the soft-thresholding operator is quite marginal because it has a simple closed-form solution. The residual:

R^(k)=U^(k)V^(k)−Z^(k)−M  (26)

reflects how well the current iterate satisfies the linear constraint and can be used to check for convergence. Specifically, the iterations performed by pseudocode logic 300 of the illustrated embodiment are terminated when the normalized Frobenius norm of the residual is less than a small tolerance parameter δ>0, that is:

$\begin{matrix}{\frac{{R^{k}}_{F}}{{M}_{P}} < {\delta.}} & (27)\end{matrix}$

It should be appreciated that the dominant complexity per iteration of the ADMM implemented according to embodiments of a l₁-PCA technique is the truncated SVD calculation. Accordingly, the total complexity of the ADMM of an embodiment of a l₁-PCA technique is O(n₁n₂rN_(ADMM)), where N_(ADMM) is the required number of iterations of the algorithm. Typically, a value of several tens is enough for N_(ADMM) to attain robust subspace factorization.
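Putting the foregoing together, the following is a compact Python/NumPy sketch of the ADMM iterations (15)-(17), corresponding in spirit to pseudocode logic 300 of FIG. 3; the function name, default parameter values, and iteration cap are illustrative assumptions rather than the claimed implementation:

```python
import numpy as np

def l1_pca_admm(M, r, mu=5.0, delta=1e-3, max_iter=500):
    """l1-PCA by ADMM: min ||Z||_1 s.t. Z = UV - M, with target rank r.
    Returns factors U, V whose product is the low-rank estimate of M."""
    Z = np.zeros_like(M)    # splitting variable
    Lam = np.zeros_like(M)  # Lagrange multiplier matrix
    norm_M = np.linalg.norm(M, 'fro')
    for _ in range(max_iter):
        # Step (15): l2-subspace decomposition via truncated SVD of X^k, eqs. (19)-(21)
        X = Z - Lam / mu + M
        G, lam, HH = np.linalg.svd(X, full_matrices=False)
        U, V = G[:, :r], lam[:r, None] * HH[:r, :]
        # Step (16): proximity operator of the l1-norm (soft-thresholding, eq. (25))
        Y = U @ V + Lam / mu - M
        shrunk = np.maximum(np.abs(Y) - 1.0 / mu, 0.0)
        Z = shrunk / (shrunk + 1.0 / mu) * Y
        # Step (17): gradient-ascent update of the dual variable
        R = U @ V - Z - M  # residual of eq. (26)
        Lam = Lam + mu * R
        if np.linalg.norm(R, 'fro') / norm_M < delta:  # stopping rule of eq. (27)
            break
    return U, V
```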

Having described concepts employed by a l_(p)-PCA technique providing the low-rank matrix approximation using low-rank matrix factorization in the l_(p)-norm space, where 1≤p<2, various exemplary l_(p)-PCA technique applications are provided below. In particular, l_(p)-PCA techniques as employed in the applications of source localization, texture inpainting, and video background extraction are provided below. These exemplary l_(p)-PCA technique applications illustrate the excellent performance of the l_(p)-PCA in accordance with the concepts herein.

The convergence behavior and source localization performance in the presence of impulsive noise of a l_(p)-PCA technique may be shown using synthesized data having impulsive noise. For example, two widely used PDF models for impulsive noise, namely, Gaussian mixture (GM) and generalized Gaussian distribution (GGD), may be considered for runtime evaluation of a l_(p)-PCA technique.

For the GM PDF model, each q_(i,j) is a two-term circular independent GM variable and its PDF is:

$\begin{matrix}{{{p_{q}(q)} = {\sum\limits_{l = 1}^{2}\; {\frac{c_{l}}{{\pi\sigma}_{l}^{2}}{\exp \left( {- \frac{{q}^{2}}{\sigma_{l}^{2}}} \right)}}}},} & (28)\end{matrix}$

where c_(l)∈[0, 1] and σ_(l)² are the probability and variance of the lth term, respectively, with c₁+c₂=1. If σ₂²>>σ₁² and c₂<c₁ are selected, large noise samples of variance σ₂² occurring with the smaller probability c₂ are the outliers embedded in Gaussian background noise of variance σ₁². Thus, the GM model captures well the phenomenon in which both Gaussian noise and impulsive noise are present. The total variance of q_(i,j) is σ_(q)²=c₁σ₁²+c₂σ₂². Setting σ₂²=100σ₁² and c₂=0.1 gives 10% outliers in the GM model.
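For illustration, GM noise samples matching the PDF of equation (28) may be generated as in the following Python/NumPy sketch (the function name and defaults are illustrative assumptions); each sample is drawn from the large-variance term with probability c₂ and from the background term otherwise:

```python
import numpy as np

def gm_noise(shape, sigma1_sq, sigma2_sq, c2=0.1, rng=None):
    """Two-term circular Gaussian-mixture noise per eq. (28). A circular
    complex Gaussian of variance s has real/imag parts of variance s/2."""
    rng = np.random.default_rng() if rng is None else rng
    var = np.where(rng.random(shape) < c2, sigma2_sq, sigma1_sq)
    return np.sqrt(var / 2) * (rng.standard_normal(shape)
                               + 1j * rng.standard_normal(shape))
```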

The PDF of the circular zero-mean GGD with variance σ_(q)² is:

$\begin{matrix}{{{p_{q}(q)} = {\frac{{\beta\Gamma}\left( {4/\beta} \right)}{2{\pi\sigma}_{q}^{2}{\Gamma^{2}\left( {2/\beta} \right)}}{\exp \left( {- \frac{{q}^{\beta}}{c\; \sigma_{q}^{\beta}}} \right)}}},} & (29)\end{matrix}$

where β>0 is the shape parameter, Γ(·) is the Gamma function, and c=(Γ(2/β)/Γ(4/β))^(β/2). The GGD reduces to the circular Gaussian distribution at β=2. Sub-Gaussian and heavy-tailed samples are modeled by β>2 and β<2, respectively. In particular, β=1 corresponds to the Laplacian distribution. The smaller the value of β, the more impulsive the noise. In the analysis provided herein, β=0.4 has been adopted as providing suitably impulsive noise in the GGD model.
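Sampling from the circular GGD of equation (29) is less standard; one possible route, sketched below as an assumption rather than a prescribed method, uses the observation that a change of variables in the radial density implies the modulus ρ=|q| satisfies ρ^β/(cσ_q^β)~Gamma(2/β, 1), with the phase uniform on [0, 2π):

```python
import numpy as np
from scipy.special import gamma as Gamma

def ggd_noise(shape, sigma_q_sq, beta=0.4, rng=None):
    """Circular zero-mean complex GGD samples matching eq. (29)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(sigma_q_sq)
    c = (Gamma(2.0 / beta) / Gamma(4.0 / beta)) ** (beta / 2.0)  # as after eq. (29)
    v = rng.gamma(2.0 / beta, 1.0, shape)          # v = rho^beta / (c * sigma^beta)
    rho = (c * sigma ** beta * v) ** (1.0 / beta)  # modulus of each sample
    return rho * np.exp(2j * np.pi * rng.random(shape))  # uniform phase
```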

In examining the convergence behavior of the ADMM implemented by a l_(p)-PCA technique according to the concepts herein, the ACO method is also implemented to show the comparative performance. For the ACO, the objective function f_(p)(U,V) is minimized over one factored matrix while the other factor is fixed. More specifically, at the (k+1)th (k=0, 1, . . . ) iteration, U and V are alternatingly minimized:

$V^{k + 1} = \arg\min\limits_{V} \left\| U^{k}V - M \right\|_{p}^{p} \quad \text{and} \qquad (30)$

$U^{k + 1} = \arg\min\limits_{U} \left\| UV^{k + 1} - M \right\|_{p}^{p}. \qquad (31)$

It should be appreciated that the minimizations in equations (30) and (31) are convex for 1≤p≤2 and thus global convergence is guaranteed. Equation (30) is separable into n₂ independent l_(p)-regression subproblems:

$\min\limits_{v_{t} \in \mathbb{C}^{r}} \left\| U^{k}v_{t} - m_{t} \right\|_{p}^{p}, \quad t = 1, 2, \ldots, n_{2}, \qquad (32)$

where v_(t) denotes the tth column of V and m_(t) the tth column of M. Each subproblem can be solved by the iteratively reweighted least squares (IRLS) algorithm with a computational complexity of O(n₁r²N_(IRLS)), where N_(IRLS) is the number of iterations required for the IRLS to converge. Thus, the complexity for equation (30) is O(n₁n₂r²N_(IRLS)). Since equations (30) and (31) have the same structure, equation (31) is solved in the same manner as that set forth above with respect to equation (30), with the same complexity. As a result, the total complexity of the ACO is O(n₁n₂r²N_(IRLS)N_(ACO)), where N_(ACO) is the number of iterations required for the ACO method to converge.
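For reference, the IRLS solver for the l_(p)-regression subproblem of equation (32) may be sketched as follows (an illustrative Python/NumPy sketch with an assumed iteration count and regularization constant, not the only possible IRLS variant):

```python
import numpy as np

def irls_lp_regression(U, m, p=1.0, n_iter=30, eps=1e-8):
    """Solve min_v ||U v - m||_p^p (eq. (32)) by iteratively reweighted
    least squares: each pass solves a weighted least-squares problem with
    weights |residual|^(p-2), regularized by eps to avoid division by zero."""
    v = np.linalg.lstsq(U, m, rcond=None)[0]        # l2 solution as initialization
    for _ in range(n_iter):
        w = (np.abs(U @ v - m) + eps) ** (p - 2.0)  # IRLS weights
        Uw = U * w[:, None]                         # row-weighted design matrix
        v = np.linalg.solve(U.conj().T @ Uw, U.conj().T @ (w * m))
    return v
```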

The noise-free matrix L of rank r may be generated by multiplying A∈ℂ^(n₁×r) and S∈ℂ^(r×n₂) whose entries satisfy the standard circular Gaussian distribution. The independent GM noise of variance σ_(q)² may be added to all elements of L to produce the observed data matrix M=L+Q. The signal-to-noise ratio (SNR) may be defined as

$\text{SNR} = \frac{\left\| L \right\|_{F}^{2}}{n_{1}n_{2}\sigma_{q}^{2}}, \qquad (33)$

where ∥L∥_(F)²/(n₁n₂) represents the average signal power. For fair comparison, both the ADMM and ACO algorithms apply the same initialization, and p=1 is employed in the latter. In the example herein, n₁=20, n₂=50, r=4 and SNR=6 dB are assigned. The values of the objective functions of the ADMM and ACO schemes, computed up to the computer round-off precision using finite iterations, are denoted as f_(ADMM)* and f_(ACO)*, respectively.
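The synthetic-data setup just described may be sketched as follows (illustrative Python/NumPy only; this builds on the gm_noise helper sketched above and assumes the 10%-outlier GM setting σ₂²=100σ₁², c₂=0.1):

```python
import numpy as np

def synth_observation(n1=20, n2=50, r=4, snr_db=6.0, rng=None):
    """Generate M = L + Q with L = A @ S of rank r and GM noise Q whose
    total variance sigma_q^2 is set from the requested SNR via eq. (33)."""
    rng = np.random.default_rng() if rng is None else rng
    cg = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
    A, S = cg(n1, r), cg(r, n2)  # standard circular Gaussian entries
    L = A @ S
    sigma_q_sq = np.linalg.norm(L, 'fro') ** 2 / (n1 * n2 * 10 ** (snr_db / 10))
    c1, c2 = 0.9, 0.1
    sigma1_sq = sigma_q_sq / (c1 + 100.0 * c2)  # sigma_q^2 = c1*s1 + c2*(100*s1)
    Q = gm_noise((n1, n2), sigma1_sq, 100.0 * sigma1_sq, c2, rng)
    return L + Q, A
```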

FIG. 4 shows the difference in the ACO and ADMM objective functions, namely, f_(ACO)*−f_(ADMM)*, for 50 independent runs. It should be appreciated that all differences are positive, indicating that f_(ADMM)*<f_(ACO)* and that the two schemes converge to different points. That is, the ACO scheme yields inferior solutions while the ADMM scheme converges to a better point with a smaller objective function value.

FIG. 5 compares the normalized decrease of the ACO and ADMM objective functions, namely, |f(U^(k), V^(k))−f*|/f(U⁰, V⁰), where f* is the global minimum. In the comparison of FIG. 5, three values of μ, namely 1, 5, and 10, are employed for the ADMM scheme. It should be appreciated that f* is very difficult to obtain, and f_(ADMM)* has been used in the example instead. As discussed above, f_(ADMM)* is calculated up to the computer round-off precision using finite iterations in advance. FIG. 6 shows the normalized Frobenius norm of the residual in the ADMM scheme, namely, ∥R^(k)∥_(F)/∥M∥_(F), versus iteration number. It is expected that the estimated signal subspace Û spans the range space of A. The normalized subspace distance between Û and A is taken as the performance measure to evaluate the quality of the subspace estimate, which is defined as:

$\begin{matrix}{{{{SD}\left( {\hat{U},A} \right)} = \frac{{{\Pi_{U} - \Pi_{A}}}_{F}}{{\Pi_{A}}_{F}}},} & (34)\end{matrix}$

where Π_(A)=A(A^(H)A)⁻¹A^(H) is the projection matrix onto the column space of A. A smaller value of SD(Û,A) indicates a more accurate estimate of U. In the ideal case when Û and A span the same column space, the subspace distance is zero. FIG. 7 compares the subspace distance versus iteration number, and it should be appreciated that global convergence is not guaranteed for either algorithm. It can be seen that the ACO scheme is inferior to the ADMM scheme, as the ACO scheme has a larger objective function value and subspace distance, although it has a rapid decreasing rate at the initial stage.
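The subspace distance of equation (34) may be computed as in the following illustrative Python/NumPy sketch (the function name is an assumption):

```python
import numpy as np

def subspace_distance(U_hat, A):
    """Normalized subspace distance of eq. (34), built from the projection
    matrices onto the column spaces of U_hat and A."""
    proj = lambda X: X @ np.linalg.solve(X.conj().T @ X, X.conj().T)  # X(X^H X)^-1 X^H
    return (np.linalg.norm(proj(U_hat) - proj(A), 'fro')
            / np.linalg.norm(proj(A), 'fro'))
```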

It can be seen from FIGS. 5-7 that the ADMM scheme with different values of μ converges to the same solution in the steady state although the convergence rates differ. That is, the penalty parameter only affects the convergence speed. The value of μ=1 corresponds to the fastest rate at the transient but it slows down later, implying that the selection of μ is quite flexible.

It can be seen from FIG. 6 that several tens of iterations are employed for the ADMM scheme to attain a normalized residual of 10⁻³ to 10⁻⁴. This order of magnitude of iterations is also enough for the convergence of the subspace distance, as may be seen in FIG. 7.

To compare the implementation complexity, the runtimes of the ADMM and ACO schemes have been measured using MATLAB on a computer with a 3.2 GHz CPU and 4 GB memory. The simulation settings remain unchanged except that n₁ and n₂ vary. The CPU times (in seconds) with different sets of n₁ and n₂, which are based on an average of 20 independent runs, are listed in the table below. The stopping parameter used in the ADMM simulations is δ=10⁻³. This value is also used as the tolerance for the ACO simulations (i.e., the ACO terminates when the relative change of the objective function is less than 10⁻³). It can be seen from the results represented in the table below that the ADMM scheme is more computationally efficient than the ACO scheme, particularly for larger scale problems. Combining the results in FIGS. 4-7, it can be appreciated that the ADMM scheme is superior to the ACO scheme in terms of robust subspace estimation performance and computational complexity.

        n₁ = 20       n₁ = 40       n₁ = 80       n₁ = 200      n₁ = 1000
        n₂ = 50       n₂ = 100      n₂ = 200      n₂ = 500      n₂ = 2000
ADMM    1.37 × 10⁻¹   3.21 × 10⁻¹   6.52 × 10⁻¹   3.05 × 10⁰    1.06 × 10²
ACO     1.10 × 10¹    2.78 × 10¹    1.15 × 10²    1.25 × 10³    1.28 × 10⁵

In addition to the aforementioned advantages of l_(p)-PCA techniques, a l_(p)-PCA technique implemented in accordance with the concepts herein provides robust direction-of-arrival (DOA) estimation. Target localization using measurements from an array of spatially separated sensors has been one of the central problems in numerous application areas including radar, sonar, global positioning system, wireless communications, multimedia, and sensor networks. The positions of the targets can be represented by their directions-of-arrival (DOAs). The subspace based DOA estimation approach is a standard choice among many DOA estimators because it strikes a good balance between estimation accuracy and computational complexity. The underlying idea of such a subspace methodology is to separate the data into signal and noise subspaces. It is usually achieved by SVD of the observed data matrix or eigenvalue decomposition (EVD) of the sample covariance matrix, and the parameters of interest are then extracted from the corresponding eigenvectors, singular vectors, eigenvalues or singular values.

In an example comprising a uniform linear array (ULA) of m sensors with inter-element spacing d, the ULA receives r far-field and narrowband sources emitting plane waves. Where the first sensor is assigned as the reference, the complex baseband signal received by the ith (i=1, 2, . . . , m) sensor may be modeled as:

$\begin{matrix}{{{y_{i}(t)} = {{\sum\limits_{j = 1}^{r}\; {{s_{j}(t)}e^{i\; 2{\pi {({i - 1})}}{\sin {(\theta_{j})}}{d/\eta}}}} + {\xi_{i}(t)}}},} & (35)\end{matrix}$

where ι=√(−1), t is the discrete-time index, s_(j)(t) is the jth (j=1, 2, . . . , r) source signal with θ_(j) being its DOA, ξ_(i)(t) is the additive non-Gaussian noise at the ith sensor, and η is the wavelength of the signal. To avoid phase ambiguity, d≤η/2 is used in this example. According to commonly used assumptions, the number of sources is less than the number of sensors, namely, r<m, r is known, and the zero-mean sources are mutually independent, while the noises {ξ_(i)(t)}_(i=1)^(m) are spatially uncorrelated and temporally white, and statistically independent of the sources.

Stacking the outputs of all the sensors in a vector y_(t)=[y₁(t) y₂(t) . . . y_(m)(t)]^(T)∈ℂ^(m), the matrix-vector formulation of equation (35) is then:

y_(t)=As_(t)+ξ_(t),  (36)

where s_(t)=[s₁(t) s₂(t) . . . s_(r)(t)]^(T)∈ℂ^(r) is the source vector, ξ_(t)=[ξ₁(t) ξ₂(t) . . . ξ_(m)(t)]^(T)∈ℂ^(m) is the noise vector, and A∈ℂ^(m×r) is the array manifold matrix. The array manifold matrix has the following form:

A=[a(θ₁) a(θ₂) . . . a(θ_(r))],  (37)

with a(θ) being the steering vector:

a(θ)=[1 e^(ι2π sin(θ)d/η) . . . e^(ι2π(m−1)sin(θ)d/η)]^(T).  (38)
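For illustration, the steering matrix of equations (37)-(38) and a set of snapshots per equation (39) may be generated as sketched below in Python/NumPy (the sources here are drawn as unit-power circular Gaussians for simplicity, whereas the simulations described later use QPSK sources; impulsive noise can be added with the GM/GGD samplers sketched earlier):

```python
import numpy as np

def ula_snapshots(doas_deg, m, n, d_over_eta=0.5, rng=None):
    """Noise-free ULA data Y = A @ S (eq. (39) with Theta = 0) for a
    uniform linear array of m sensors and spacing d = d_over_eta * eta."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.deg2rad(np.asarray(doas_deg))
    i = np.arange(m)[:, None]  # sensor index 0..m-1
    A = np.exp(1j * 2 * np.pi * i * np.sin(theta)[None, :] * d_over_eta)  # eq. (38)
    S = (rng.standard_normal((len(doas_deg), n))
         + 1j * rng.standard_normal((len(doas_deg), n))) / np.sqrt(2)
    return A @ S, A
```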

In subspace based DOA estimation, the DOAs of the r sources may be estimated based on n snapshots. The n snapshots may be collected in the following matrix Y:

Y≜[y₁ y₂ . . . y_(n)]=AS+Θ∈ℂ^(m×n),  (39)

where S=[s₁ s₂ . . . s_(n)]∈ℂ^(r×n) and Θ=[ξ₁ ξ₂ . . . ξ_(n)]∈ℂ^(m×n) are the source and noise data matrices, respectively. The matrix A is of full column rank and S is of full row rank. In the noiseless case of Θ=0, rank(Y)=r<m due to rank(A)=rank(S)=r. As a result, an accurate estimate of AS is given by the truncated SVD according to equations (6)-(8):

Ŷ=ÛV̂, Û=G_(s), V̂=Σ_(s)H_(s)^(H).  (40)

It should be appreciated that, in array processing, the range space spanned by G_(s)=[g₁ g₂ . . . g_(r)]∈ℂ^(m×r) is called the signal subspace while its orthogonal complement, denoted by G_(n)∈ℂ^(m×(m−r)), is referred to as the noise subspace. The projections onto the signal and noise subspaces are related by G_(n)G_(n)^(H)=I−G_(s)G_(s)^(H), where I is the identity matrix. DOA estimation can be performed using either the signal or noise subspace. Applying the multiple signal classification (MUSIC) subspace based method, the DOA estimates are given by the r peaks of the following spatial spectrum:

$\begin{matrix}{{{P_{MUSIC}(\theta)} = \frac{1}{{\alpha^{H}(\theta)}\left( {I - {\hat{U}{\hat{U}}^{H}}} \right){\alpha (\theta)}}},} & (41)\end{matrix}$

which is based on the fact that the steering vectors of the r sources are orthogonal to the noise subspace. It should be understood, however, that other subspace approaches, including the estimation of signal parameters via rotational invariance techniques (ESPRIT) and principal-singular-vector utilization for modal analysis (PUMA), can also be applied after Û is obtained.
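The MUSIC spatial spectrum of equation (41) may be evaluated on an angle grid as in the following illustrative sketch, given a signal-subspace estimate Û with orthonormal columns (e.g., as produced by the ADMM scheme herein); the function name and grid parameterization are assumptions:

```python
import numpy as np

def music_spectrum(U_hat, grid_deg, d_over_eta=0.5):
    """P_MUSIC(theta) of eq. (41) over a grid of candidate angles (degrees)."""
    m = U_hat.shape[0]
    theta = np.deg2rad(np.asarray(grid_deg))
    i = np.arange(m)[:, None]
    A = np.exp(1j * 2 * np.pi * i * np.sin(theta)[None, :] * d_over_eta)  # steering vectors
    P_noise = np.eye(m) - U_hat @ U_hat.conj().T  # I - U U^H
    denom = np.einsum('ij,ik,kj->j', A.conj(), P_noise, A).real  # a^H (I - UU^H) a per angle
    return 1.0 / denom
```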

In the presence of impulsive non-Gaussian Θ, a l_(p)-PCA technique implemented in accordance with the concepts herein may be employed to provide robust subspace factorization for computing Û. In the simulations of the example herein, in addition to the ADMM scheme of embodiments of a l_(p)-PCA technique, the results of the l₁-MUSIC based on the ACO, conventional MUSIC, fractional lower-order moment MUSIC (FLOM), robust covariance based MUSIC (ROC), zero-memory non-linearity (ZMNL), and MM methods, as well as the Cramér-Rao bound (CRB), are also provided. The ADMM and ACO algorithms directly find a robust estimated subspace while the others robustly estimate the covariance and then employ the SVD or EVD to compute the subspace. Prior to applying the MUSIC, the ZMNL method uses a Gaussian-tailed ZMNL function to clip outliers. In the MM method, the covariance matrix is first robustified via MM estimation and then MUSIC is exploited for DOA estimation, while the FLOM and ROC algorithms adopt the fractional lower-order moments in the sample covariance computation. For the purpose of fair comparison, p=1 is set as the lower order in the FLOM and ROC schemes, which aligns with the ADMM and ACO methods. A ULA with inter-sensor spacing of half a wavelength is considered. The CRB for the DOAs θ=[θ₁ θ₂ . . . θ_(r)]^(T) under non-Gaussian noise is given by:

$\begin{matrix}{{{{CRB}(\theta)} = {\frac{1}{I_{c}}{diag}\left\{ {\sum\limits_{i = 1}^{n}\; {{Re}\left( {S_{i}^{H}{B^{H}(\theta)}\Pi_{A}^{\bot}{B(\theta)}S_{i}} \right)}} \right\}}},} & (42)\end{matrix}$

where S_(t)=diag(s₁(t), s₂(t), . . . , s_(r)(t)) is a diagonal matrix, B(θ)=[b(θ₁) b(θ₂) . . . b(θ_(r))] with b(θ)=da(θ)/dθ, Π_(A)^(⊥)=I−Π_(A) is the projection onto the orthogonal complement of the column space of A, and

$I_{c} = \pi\int_{0}^{\infty} \frac{\left( p_{\xi}^{\prime}(\rho) \right)^{2}}{p_{\xi}(\rho)}\,\rho\, d\rho \qquad (43)$

with ρ=|ξ| being the modulus of the complex noise variable ξ and p′_(ξ)(ρ) the derivative of p_(ξ)(ρ). The PDF of the noise affects the CRB only through the scalar I_(c). The CRBs in the presence of GM and GGD noises may be numerically computed using equations (42) and (43).

Monte Carlo trials have been carried out to evaluate the performance of the DOA estimators. The root mean square errors (RMSEs) of the subspace distance and DOAs are taken as the performance measures, which are defined as

$\begin{matrix}{{{{RMSE}\left( \hat{U} \right)} = {\frac{1}{{\Pi_{A}}_{F}}\sqrt{\frac{1}{N}{\sum\limits_{i = 1}^{N}\; {{\Pi_{{\hat{U}}^{i}} - \Pi_{A}}}_{F}^{2}}}}}{and}} & (44) \\{{{{RMSE}\left( {\hat{\theta}}_{j} \right)} = \sqrt{\frac{1}{N}{\sum\limits_{i = 1}^{N}\; \left( {{\hat{\theta}}_{j}^{i} - \theta_{j}} \right)^{2}}}},{j = 1},2,\ldots \mspace{14mu},r,} & (45)\end{matrix}$

where N is the number of Monte Carlo trials, and Û^(i) and θ̂_(j)^(i) are the estimated subspace and the DOA estimate of the jth source in the ith trial, respectively. The root-MUSIC is employed to calculate the DOA parameters after the signal subspace is determined, to avoid grid search over the spectrum. In the example, the emitting sources are two independent quadrature phase-shift keying (QPSK) signals with equal power and DOAs θ₁=−8° and θ₂=10°. This means that the target rank is r=2. The numbers of sensors and snapshots are m=6 and n=100, while N=200. FIGS. 8 and 9 show the RMSEs of the subspace distance and the DOA estimate for θ₁ versus SNR in GM noise, respectively. FIGS. 10 and 11 show the results for GGD noise. As the performance for θ₂ is similar, the corresponding RMSEs are not shown.

It can be seen from FIGS. 8-11 that the conventional MUSIC scheme is not robust in the presence of impulsive noise. In contrast, as can also be seen from FIGS. 8-11, the ADMM scheme has the best performance, and is superior to the l₁-MUSIC scheme. It should be understood that the l₁-MUSIC method can produce smaller RMSEs when it uses multiple random initializations and the best result is selected; however, using multiple initializations requires running the ACO many times. For fair comparison, all the methods employ the same initial value. The MM and ZMNL methods also show good robustness to outliers, but the latter suffers performance saturation as SNR increases. This is because the ZMNL function generally destroys the low-rank structure of the signal subspace, which leads to performance saturation or even degradation. Although the FLOM and ROC methods outperform the conventional MUSIC algorithm, they are inferior to the ADMM, l₁-MUSIC, MM, and ZMNL schemes.

The following example of a l_(p)-PCA technique employed in the application of texture inpainting illustrates results using real-world data. It should be appreciated that many natural and man-made images include highly regular textures, corresponding to low-rank matrices. In texture inpainting, the task is to recover the background texture which is sparsely occluded by untextured components. That is, the observed image (e.g., provided as an observed data matrix), the textured component, and the untextured component may be modeled as M, L, and Q in equation (2), respectively.

In the example herein, the ADMM scheme of a l_(p)-PCA technique of embodiments with r=2 is applied to an image of a chessboard with 377×370 pixels (shown in FIG. 12A). The results of operation of the l_(p)-PCA technique are shown in FIGS. 12B and 12C, wherein FIG. 12B shows the recovered background texture (low-rank component) and FIG. 12C shows the recovered untextured components (sparse component). It can be seen from FIGS. 12A-12C that the ADMM scheme flawlessly recovers the chessboard from the pieces. It should be appreciated from the foregoing that the l_(p)-PCA technique of embodiments applies directly to real-valued observations because the real numbers are merely a special case of the complex numbers.

The following examples of a l_(p)-PCA technique employed in the application of video background extraction further illustrate results using real-world data. In this application, the background scene is extracted from a number of video frames. Converting each frame of a video into a column of a matrix, the resultant matrix is intrinsically of low rank due to the correlation between frames. In the presence of foreground objects, especially in busy scenes, every frame may contain some anomalies. Accordingly, the background may be modeled as approximately low rank (e.g., the background is L and the foreground is Q according to the model of equation (2)). Foreground objects, such as moving cars or walking pedestrians, generally occupy only a fraction of the image pixels and thus may be treated as sparse outliers. If the background is invariant, the rank may be set as r=1; otherwise, the rank may be selected slightly larger than one to accommodate small changes in the background.
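A minimal sketch of the matrix construction just described is given below, reusing the illustrative l1_lowrank_admm routine from the texture-inpainting sketch above; the function and variable names are assumptions of the sketch.

```python
import numpy as np

def background_extract(frames, r=1):
    """frames: list of n2 grayscale images, each of shape (h, w)."""
    h, w = frames[0].shape
    # Stack each vectorized frame as one column: M has shape (h*w, n2).
    M = np.stack([f.reshape(-1) for f in frames], axis=1).astype(float)
    L, E = l1_lowrank_admm(M, r)      # from the texture-inpainting sketch
    background = L.reshape(h, w, -1)  # low-rank component, one slice per frame
    foreground = E.reshape(h, w, -1)  # sparse component (moving objects)
    return background, foreground
```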

The examples of video background extraction herein consider two video datasets available from CDNET. In particular, the video datasets of the examples are "backdoor", comprising a video sample of 2000 color frames with prevalent hard and soft shadows and intermittent shades, and "streetLight", comprising a video sample of 3200 color frames containing background objects stopping for a short while and then moving away. In the examples herein, for both datasets, the first 200 frames of the video samples were selected and converted to grayscale versions, as represented in FIGS. 13A and 14A, respectively. All frames of these examples have a size of 240×320, corresponding to 76800 pixels. Thus, the observed data matrix constructed from each video is M ∈ ℝ^(76800×200), where n₁=76800 and n₂=200. As the two videos have relatively static backgrounds, r=1 is selected in both cases. FIGS. 13B and 13C and FIGS. 14B and 14C show the results for three representative frames of the "backdoor" and "streetLight" datasets, respectively. It can be seen from these representative frames that the ADMM scheme successfully separates the background from the foreground.

The foregoing examples demonstrate that the ADMM scheme of a l_(p)-PCA technique of embodiments successfully extracts the low-rank components in challenging applications. In particular, the example applications demonstrate robust signal subspace estimation leading to accurate source localization in the presence of impulsive noise, separation of the textured image from the untextured components in texture inpainting, and separation of background from foreground in video frames, where the former and latter are modeled as a low-rank matrix and a sparse matrix, respectively.

Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure of the present invention, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.

What is claimed is:
1. A method for low-rank approximation of an observed data matrix, the method comprising: obtaining, by a processor-based system, the observed data matrix; performing, by logic of the processor-based system, factorization of the observed data matrix in l_(p)-norm space, wherein p<2; and providing, by the processor-based system from a result of the l_(p)-norm space factorization of the observed data matrix, a low-rank approximation comprising principal components extracted from the observed data matrix.
2. The method of claim 1, wherein obtaining the observed data matrix comprises: deriving the observed data matrix from a signal selected from the group consisting of a data signal, a graphical signal, an audio signal, a video signal, and a multimedia signal, wherein the observed data matrix comprises a matrix of a plurality of data points representing the signal.
3. The method of claim 2, wherein providing the low-rank approximation comprises: providing the low-rank approximation for use in at least one application selected from the group consisting of a surveillance application, a machine learning application, a web search application, a bioinformatics application, a dimensionality reduction application, and a signal processing application.
4. The method of claim 1, wherein the observed data matrix comprises presence of at least one of impulsive noise, outliers, or sparse features.
5. The method of claim 1, wherein 1≤p<2.
6. The method of claim 5, wherein performing the factorization of the observed data matrix in the l_(p)-norm space comprises: performing factorization of the observed data matrix in l₁-norm space, wherein p=1.
7. The method of claim 1, wherein performing the factorization of the observed data matrix in the l_(p)-norm space comprises: applying alternating direction method of multipliers (ADMM) to solve subspace decomposition of low-rank matrix factorization with respect to the observed data matrix.
8. The method of claim 7, wherein applying the ADMM to solve subspace decomposition of low-rank matrix factorization comprises: minimizing a residual matrix in the subspace factorization of the observed data matrix.
9. The method of claim 7, wherein applying the ADMM to solve subspace decomposition of low-rank matrix factorization comprises: solving a l₂-subspace decomposition; and calculating a proximity operator of the l_(p)-norm.
10. The method of claim 9, wherein solving the l₂-subspace decomposition comprises: using least Frobenius norm solved by truncated singular value decomposition (SVD).
11. The method of claim 9, wherein calculating the proximity operator of the l_(p)-norm comprises: using a closed-form soft-thresholding operator for complex variables.
12. The method of claim 9, wherein solving the l₂-subspace decomposition and calculating the proximity operator of the l_(p)-norm are performed in each iterative step of the ADMM.
13. A system for low-rank approximation of an observed data matrix, the system comprising: one or more data processors; and one or more non-transitory computer-readable storage media containing program code configured to cause the one or more data processors to perform operations including: obtain the observed data matrix; perform factorization of the observed data matrix in l_(p)-norm space, wherein p<2; and provide, from a result of the l_(p)-norm space factorization of the observed data matrix, a low-rank approximation comprising principal components extracted from the observed data matrix.
14. The system of claim 13, wherein the program code configured to cause the one or more data processors to obtain the observed data matrix further causes the one or more data processors to: derive the observed data matrix from a signal selected from the group consisting of a data signal, a graphical signal, an audio signal, a video signal, and a multimedia signal, wherein the observed data matrix comprises a matrix of a plurality of data points representing the signal.
15. The system of claim 14, wherein the program code configured to cause the one or more data processors to provide the low-rank approximation further causes the one or more data processors to: provide the low-rank approximation for use in at least one application selected from the group consisting of a surveillance application, a machine learning application, a web search application, a bioinformatics application, a dimensionality reduction application, and a signal processing application.
16. The system of claim 13, wherein the observed data matrix comprises presence of at least one of impulsive noise, outliers, or sparse features.
17. The system of claim 13, wherein the program code configured to cause the one or more data processors to perform the factorization of the observed data matrix in the l_(p)-norm space further causes the one or more data processors to: perform factorization of the observed data matrix in l₁-norm space, wherein p=1.
18. The system of claim 13, wherein the program code configured to cause the one or more data processors to perform the factorization of the observed data matrix in the l_(p)-norm space further causes the one or more data processors to: apply alternating direction method of multipliers (ADMM) to solve subspace decomposition of low-rank matrix factorization with respect to the observed data matrix.
19. The system of claim 18, wherein the program code configured to cause the one or more data processors to apply the ADMM to solve subspace decomposition of low-rank matrix factorization further causes the one or more data processors to: minimize a residual matrix in the subspace factorization of the observed data matrix.
20. The system of claim 18, wherein the program code configured to cause the one or more data processors to apply the ADMM to solve subspace decomposition of low-rank matrix factorization further causes the one or more data processors to: solve a l₂-subspace decomposition; and calculate a proximity operator of the l_(p)-norm.
21. The system of claim 20, wherein the program code configured to cause the one or more data processors to solve the l₂-subspace decomposition further causes the one or more data processors to: use least Frobenius norm solved by truncated singular value decomposition (SVD).
22. The system of claim 20, wherein the program code configured to cause the one or more data processors to calculate the proximity operator of the l_(p)-norm further causes the one or more data processors to: use a closed-form soft-thresholding operator for complex variables.
23. A method for low-rank approximation of an observed data matrix, the method comprising: obtaining, by a processor-based system, the observed data matrix; performing, by logic of the processor-based system, factorization of the observed data matrix in l₁-norm space by applying alternating direction method of multipliers (ADMM) to solve subspace decomposition of low-rank matrix factorization with respect to the observed data matrix; and providing, by the processor-based system from a result of the l₁-norm space factorization of the observed data matrix, a low-rank approximation comprising principal components extracted from the observed data matrix.
24. The method of claim 23, wherein applying the ADMM to solve subspace decomposition of low-rank matrix factorization comprises: minimizing a residual matrix in the subspace factorization of the observed data matrix.
25. The method of claim 23, wherein applying the ADMM to solve subspace decomposition of low-rank matrix factorization comprises: performing iterative steps of the ADMM, wherein each iterative step of the ADMM includes: solving a l₂-subspace decomposition using least Frobenius norm solved by truncated singular value decomposition (SVD); and calculating a proximity operator of the l₁-norm using a closed-form soft-thresholding operator for complex variables.